Comprehensive lipid and lipid-related gene investigations of host immune responses to characterize metabolism-centric biomarkers for pulmonary tuberculosis

Despite remarkable success in the prevention and treatment of tuberculosis (TB), it remains one of the most devastating infectious diseases worldwide. Management of TB requires an efficient and timely diagnostic strategy. In this study, we comprehensively characterized the plasma lipidome of TB patients, then selected candidate lipid and lipid-related gene biomarkers using a data-driven, knowledge-based framework. Among 93 lipids that were identified as potential biomarker candidates, ether-linked phosphatidylcholine (PC O–) and phosphatidylcholine (PC) were generally upregulated, while free fatty acids and triglycerides with longer fatty acyl chains were downregulated in the TB group. Lipid-related gene enrichment analysis revealed significantly altered metabolic pathways (e.g., ether lipid, linolenic acid, and cholesterol) and immune response signaling pathways. Based on these potential biomarkers, TB patients could be differentiated from controls in the internal validation (random forest model, area under the curve [AUC] 0.936, 95% confidence interval [CI] 0.865–0.992). PC(O-40:4), PC(O-42:5), PC(36:0), and PC(34:4) were robust biomarkers able to distinguish TB patients from individuals with latent infection and healthy controls, as shown in the external validation. Small changes in expression were identified for 162 significant lipid-related genes in the comparison of TB patients vs. controls; in the random forest model, their utilities were demonstrated by AUCs that ranged from 0.829 to 0.956 in three cohorts. In conclusion, this study introduced a potential framework that can be used to identify and validate metabolism-centric biomarkers.

www.nature.com/scientificreports/ collected. One half was used for assessments in positive ion mode, and the other half was used for assessments in negative ion mode. The lipid fraction was completely dried in a vacuum at room temperature and stored at − 80 °C until needed.
Instrumental conditions for untargeted lipid profiling. An Acquity charged surface hybrid technology C18 column (2.1 × 100 mm, 1.7 μm) and Acquity VanGuard charged surface hybrid technology C18 pre-column (2.1 × 5 mm, 1.7 μm) were used for lipid separation with a binary gradient elution as described in Supplementary Table S2. A Shimadzu Nexera LC system (Kyoto, Japan) was utilized for the experiment. Lipid extracts were resuspended in methanol/toluene (9:1, v/v) and kept at 4 °C in an autosampler. The injected volume was ion-mode-and data-acquisition-dependent. From 200 μL of resuspended volume, 1 μL (scan profiling) and 2 μL (information-dependent acquisition and SWATH-based data-independent acquisition) were injected in positive ion mode; 3 μL (scan profiling) and 6 μL (information-dependent acquisition and SWATH) were injected in negative ion mode. The separated lipid ions were analyzed using an X500R QTOF with a Turbo V™ ion source with a TwinSpray probe (SCIEX, MA, USA). For the tandem MS analyses, either 45 eV (spread of 15 eV) or 25 eV (spread of 15 eV) were used. The MS parameters are shown in Supplementary Table S3. Mass calibration was automatically performed after every fifth injection through the instrument's CDS system, using X500R positive or negative calibration solutions.
Lipid data processing, alignment, and lipid annotation. Raw data (wiff files) were directly input to MS-DIAL (version 4.8) for data processing, alignment, and lipid identification. The parameters were ion mode-dependent, as described in Supplementary Table S4. The aligned data were exported for subsequent use. Post-alignment data processing was performed using MetaboAnalyst 5.0 and features with missing rates ≥ 50% were removed; otherwise, the k-nearest neighbors algorithm was used to impute the missing features 35 . Features with relative standard deviation of ≥ 25% in the pooled QC were also removed. The MS-DIAL inbuilt library and Fiehn's lab lipidomics library were used for lipid identification 36,37 .
Transcriptomics data processing and differential analysis. Raw counts of transcripts mapped into genes were summarized using the sum level. The annotated gene-level raw counts were normalized using Trimmed Mean of M-values. The pipeline was implemented using NetworkAnalyst 3.0 38 . Differentially expressed analysis was applied for lipid-related genes in the three transcriptome profiles (i.e., E-MTAB-8290, GSE107991, GSE101705) using two-sided unpaired t-test (rstatix package version 0.7.0, implemented in R 4.1.2). Genes with a false discovery rate (FDR) less than 0.05 were considered as significant.
Data exploration and visualization. An unsupervised method, principal component analysis (PCA), was employed to explore and visualize the lipidome data. Prior to the analysis, the data were normalized (using the median method), log-transformed, and Pareto scaled. PCs that explained the most sample variance were plotted in a two-dimensional space (MetaboAnalyst 5.0) or three-dimensional space (R package, Plotly version 4.10.0). Heatmap and volcano plots (MetaboAnalyst 5.0) were also used for data visualization.
Statistical analysis and modeling of lipidomic data. Prior to univariate analysis using an unpaired t-test, the data were normalized using the median method and log-transformed. An FDR of 0.05 was set as the threshold for significant features. Fold-change (FC) thresholds of 1.2, 1.5, and 2 were also tested for biomarker candidate selection. Class discrimination between the lipid profiles of the two groups was achieved using partial least squares-discriminant analysis (PLS-DA). Because the discriminant model has tuning parameters (e.g., the number of components), the optimal model was selected in a tenfold cross-validation process. The variable importance in projection (VIP) score of the PC1 of the optimal model was set at ≥ 1.2 as the threshold of important features used to detect potential biomarker candidates. Statistical analyses were conducted using Metabo-Analyst 5.0 unless stated otherwise.
Internal and external biomarker validation. Univariate receiver operating characteristic (ROC) curve analysis was conducted to examine the potential biomarker applications of individual lipids. Random forest and linear support vector machine (SVM) were carried out to investigate the discriminatory capacity of the lipid biomarker candidates. Random forest is an ensemble method that generates many decision trees, then aggregates their outcomes to obtain greater prediction accuracy 39 . This powerful tool uses bagging and random feature selection to build multiple base learners. In SVM, a hyperplane is identified that maximizes the margin from data points. A larger margin leads to greater separation by the hyperplane, thus reducing generalization error. The performances of random forest and SVM are stable, regardless of the domain and data types. For internal validation, the ROC curve-based exploratory analysis was utilized because it can automate important feature identification and performance evaluation. In the external validation, the biomarkers that were overlapped with the quantified lipids in the data of Cho et al. (at the fatty acyl/alkyl sum composition) were used to validate their performance in classifying TB patients from latent infection and controls 21 . All matched biomarkers were used to establish the biomarker models. The analyses were carried out in three different scenarios: TB vs. LTBI + control, TB vs. LTBI, and TB vs. control. The biomarker models using lipid-related genes in the datasets E-MTAB-8290 (54 TB, 127 control/non-TB), GSE107991 (21 TB, 12 controls, 21 LTBI), and GSE101705 (28 TB, 16 LTBI) were trained and validated using the same approach. In particular, the gene expression of matched lipid-related genes in four different data sets were utilized for the ROC-curve-based exploratory analysis. In all www.nature.com/scientificreports/ analyses, the area under the curve (AUC) and 95% confident interval (CI) of the best models are reported. The analyses were performed in the "Biomarker Analysis" module of MetaboAnalyst 5.0. Functional analysis. Biomarker candidate data were submitted to Lipid Ontology (LION) for lipid ontology enrichment analysis via the "LION-PCA heatmap" module 40 . In addition, lipid-gene association networks were analyzed using Lipidsig and lipid-genes were extracted 41 . For visualization, the R package ggplot2 (version 3.3.5) was used.

Results
Lipid profiles of TB patients are distinguishable from lipid profiles of controls. PCA was performed in positive ion mode to explore sample tendencies independent of sample source. The analysis was conducted using a total of 3791 detected lipid features of TB and control samples, with and without QC samples.
In the PCA scores plot with QC samples, all QC samples clustered tightly together (Fig. S1A), and the relative standard deviation of the raw total ion chromatogram among QC samples was only 6.5%. These data indicated satisfactory repeatability of the untargeted lipid profiling analysis, which allowed subsequent data analyses and interpretation. In the PCA scores plot with TB and control samples, the three first PCs explained 52.1% of the variance: 23.2%, 21.1%, and 7.8% for PCs 1, 2, and 3, respectively. The relative separation of samples into two separate groups is evident in the three-dimensional PCA plot (Fig. 1A). Heatmap analysis captured relative differences between the two groups at the feature level; differences in lipid features were relatively clear (Fig. 1B). In addition, PCA analysis, which included 762 detected features, were also conducted in negative ion mode. Similar to positive ion mode, the QC samples clustered together (relative standard deviation of total ion chromatogram: 5.6%, Fig. S1B). The three-dimensional PCA plot indicated relative separation of the samples into two groups (Fig. 1C). At the feature level in the heatmap, we could also notice a proportionately contrast between the two groups ( Fig. 1D). Taken together, the data exploration analyses in positive and negative ion modes indicated considerable differences between the lipid profiles of TB patients and of controls.
Univariate and multivariate analyses suggest numerous lipid biomarker candidates. Data exploration indicated considerable differences in lipid metabolic profiles between TB patients and controls, but a more sophisticated statistical approach was needed to identify lipids that could be regarded as biomarker candidates. Supervised investigation was conducted using PLS-DA and the lipid profiles of TB patients and controls. In positive ion mode, a PLS-DA model with five components classified the two groups with appropriate performance metrics ( Fig The intersection of two criteria, a VIP score (PC1, PLS-DA model) of ≥ 1.2 and an FC (t-test) of ≥ 1.5, revealed 89 and 28 potential biomarker candidates in positive and negative ion modes, respectively. Among the selected features, 73 (positive ion mode) and 26 (negative ion mode) were successfully annotated as lipids, thus yielding 93 non-overlapping lipid biomarkers (Table 1).
Internal and external validation indicate satisfactory performance of lipid biomarker candidates. Annotated lipid biomarker candidates were first subjected to univariate biomarker analysis. The ROC curves for those candidates were significantly associated with the TB status (Supplementary Table S5). Among 93 candidate lipid biomarkers, 21 had AUC values < 0.7, whereas 72 were considered promising (AUC ≥ 0.7); of the 72, 13 were considered good (AUC ≥ 0.8) and 2 were considered excellent (AUC > 0.9). The "excellent" lipid biomarkers were two ether- Multivariate biomarker analysis using the random forest method revealed that models with 93 variables had the best performance (AUC = 0.921, 95% confidence interval [95% CI] 0.834-0.987) (Fig. 3A). The result of the linear SVM method was approximately similar to the result of the random forest method (Fig. 3B). Correlation analysis showed a significant linear correlation among biomarkers for both TB patients (Fig. 3C) and controls (Fig. 3D), suggesting that a small number of lipids could be used as biomarkers to differentiate TB patients from controls.
To rule out the possibility that the internal validation overestimated candidate biomarker performance, external validation was conducted using the dataset from Cho et al.

Functional analysis reveals profound lipid metabolic alterations in TB patients. The LION
ontology results indicated that PC(O-), PC, and PE were generally enriched in the TB group, whereas FAs and triacylglycerols (TAGs) with longer acyl chains were downregulated. The TB group also exhibited enrichment of lipids associated with mitochondrion, endoplasmic reticulum, and membrane components; it showed decreased levels of LD-related lipid species (Fig. 4A).
The reported biomarker candidates belonged to 15 (sub)-classes, of which six contained ≥ 3 lipid species. Those six sub-classes were subjected to lipid-gene analysis to identify TB-associated functional dysregulation and potential gene biomarkers. As shown in Fig. 4B, numerous pathways were altered. Significantly enriched biological processes included the PI3K-Akt, Rap1, calcium, and chemokine signaling pathways. Ether lipid metabolism, fat digestion and absorption, linolenic acid metabolism, and cholesterol metabolism were also altered. The full list of altered pathways and associated genes is provided in Supplementary Table S6.
Lipid-genes are excellent biomarkers for differentiating active TB from its counterparts. Among dysregulated pathways detected in the lipid-gene analysis (p < 0.01), 162 unique genes were identified. These genes were tested for their ability to differentiate active TB from LTBI or controls in three different data sets. The random forest classifier established from the expression of lipid-related genes was able to distinguish TB from its counterparts in three different TB cohorts, with an AUC ranging from 0.829 (95% CI 0.707-0.931, E-MTAB-8290) to 0.958 (95% CI 0.909-1, GSE101705) ( Fig. 5A-D). The linear SVM model showed similar results (Fig. S4A-D). However, most genes exhibited a small FC.

Discussion
This study demonstrated differences in the lipidomes of TB patients and non-TB controls; it revealed 73 and 26 potential biomarker candidates, identified in positive and negative mode, respectively (six biomarkers were detected in both). In TB patients, the biomarkers had at least a 1.5-fold difference (in either direction). Among the significantly altered lipid sub-classes, ceramide (Cer), LPC, PC(O-), and PE were generally upregulated; certain PCs, diacylglycerols, and FAs were downregulated in TB patients. TAGs with shorter acyl chains were strongly increased in TB patients, while TAGs with longer acyl chains were decreased. The biomarkers mostly belonged to the lipid classes PC, PC(O-), PE, PE(O-), FA, and TAG, suggesting that these lipid classes are important in TB pathophysiology.
Among the putative lipid biomarkers, 12 were matched with previously reported lipid profiles that utilized a targeted approach 21  Lipid-related genes were associated with various TB pathologically comparable pathways; they also formed a distinctive signature that differentiated active TB from LTBI and other non-TB controls. Although gene expression was only subtly altered, it provided a consistent signature. Among the 162 lipid-related genes, 96 genes were  www.nature.com/scientificreports/   www.nature.com/scientificreports/ A significant increase in Cer(d34:1) was identified in TB patients. This biomarker has been previously reported to exhibit consistently higher levels in TB patients than in healthy individuals or patients with other respiratory diseases 23,24,42,43 . Shivakoti et al. showed that Cer(d34:1) is also related to TB treatment outcome; patients with the highest Cer levels had an increased risk of treatment failure 27 . The involvement of Cer in host immune responses against Mtb-via immune cell activation, phagocytosis, and other mechanisms-might explain its higher level in TB patients than in controls 24,44,45 .
Although Mtb relies on different carbon sources at different stages of pathogenesis, host lipids are generally the primary carbon source for Mtb in vivo 46 . The genome of Mtb laboratory strain H37Rv has > 250 genes related to lipid metabolism 47 . The infection of macrophages by Mtb triggers the formation of foamy macrophages through the accumulation of lipid bodies of TAGs and cholesterol esters 48,49 . Consistent with previous findings 23, 26 , our study showed a decrease in TAGs in host plasma; this may be related to the uptake of host TAGs into foamy macrophages to form LDs, which can serve as nutrient sources 14,48 . While LD formation may be a host-driven immune response rather than an Mtb-mediated process 19 , the resulting physiological changes in Mtb lead to TAG accumulation, LD formation, growth reduction, decreased metabolic activity, and development of phenotypic drug resistance; these processes are associated with the persistent and non-dividing stages of Mtb 50 . We also found the downregulation of FAs in TB patients; this hinders the formation of longer-chain FAs that form the main components of Mtb cell wall lipids 51 . Fas I/II-induced elongation could partially explain the decrease in plasma FAs.
Lysophosphatidylcholine acyltransferase 2 (LPCAT2) induces LD accumulation in cancer patients during the onset of chemoresistance 52 . Our study is presumably the first to report an association of the upregulation of LPCAT2 with metabolic alterations in TB patients vs. non-TB controls, suggesting that LPCAT2 can serve as a biomarker in the diagnosis of TB.
There is also emerging evidence concerning the crucial role of metabolism in host-pathogen dynamics, with the transcription factor PPAR (peroxisome proliferator-activated receptor) implicated in LD buildup during inflammation and infectious diseases [53][54][55] . Our analysis demonstrated the enrichment of several lipid-related genes associated with PPAR signaling pathways; these genes include PPARA , CD36, FABP4, and ACSL1 56 . Lipid mediators, cytokines, and chemokines may act in a paracrine manner to induce LD formation 14,57 .
We also identified the involvement of lipids and lipid-related genes in chemokine signaling (CDC42, FGR, IKBKB, RAC1), ether lipid metabolism, glycerophospholipid metabolism, sphingolipid metabolism, and phospholipase D signaling pathways. In a mouse model of TB 58 and a study of T cells from TB patients 59 , Mtb was found to inhibit host proinflammatory cytokine production through the PI3K-Akt signaling pathway. In the ontology analysis of lipid genes, the PI3K-Akt signaling pathway exhibited the most significant functional www.nature.com/scientificreports/ dysregulations in TB patients. Together, these observations support a significant role of lipid metabolism and lipid-related genes in the host immune response. Our study had several limitations. First, its focus was on the discovery and validation of lipid biomarkers; however, there is evidence to support the use of hydrophilic metabolites (e.g., glutamic acid and glutamine) as biomarkers 21 . The combined use of these metabolites and lipids would significantly improve the detection of active TB in clinical settings. Second, other infectious respiratory diseases (e.g., community-acquired pneumonia) were not included in the lipidomics analysis. Nevertheless, some markers were able to reliably distinguish TB and LTBI. Subsequent studies should examine the differential diagnostic performance of those biomarker candidates in other infectious respiratory diseases. Third, quantitative information is available for some biomarker candidates, based on isotopically labeled internal standards at ratios relative to human plasma. However, an accurate quantification strategy (e.g., AdipoAtlas 60 ) is needed to facilitate clinical application. This can be readily achieved through targeted analysis of a subset of the most promising biomarkers. Fourth, through our exploratory analysis, we enable the identification of several altered lipids and lipid genes, as well as lipid-related metabolism and immune response pathway in TB patients. Experimental studies on in vitro or animal models are required to substantiate our findings. Finally, a prospective validation cohort study with actual concentrations of lipid biomarkers is required to examine the relevance of the identified biomarkers in TB manifestations.
In summary, our study identified and validated lipid-focused biomarkers. Multiple data mining methods with lipidome and lipid-related transcript signatures were used to obtain robust biomarkers and gain new mechanistic insights into TB. Lipid species that belonged to the PC(O-), PCs, TAGs, FAs, and Cer were identified as excellent candidate biomarkers. PC(O-40:4), PC(O-42:5), PC(36:0), and PC(34:4) were externally validated and had a good performance. Additionally, our study revealed systemic and multi-omics levels of biologically relevant processes involved in host responses to Mtb infection. Overall, comprehensive omics analyses employing a data-driven, knowledge-based approach can support metabolism-centric biomarker discovery and validation.

Data availability
The data underlying this article cannot be shared publicly for the privacy of individuals that participated in the study. Pre-processed and imputed data will be shared by the corresponding author upon reasonable requests.